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Abstract 

Critical slowing down associated with the iterative solvers close to the critical point often 
hinders large-scale numerical simulation of fracture using discrete lattice networks. This 
paper presents a block circlant preconditioner for iterative solvers for the simulation of pro- 
gressive fracture in disordered, quasi-brittle materials using large discrete lattice networks. 
The average computational cost of the present alorithm per iteration is 0{rs log s)+delops, 
where the stiffness matrix A is partioned into r-by-r blocks such that each block is an s- 
by-s matrix, and delops represents the operational count associated with solving a block- 
diagonal matrix with r-by-r dense matrix blocks. This algorithm using the block circu- 
lant preconditioner is faster than the Fourier accelerated preconditioned conjugate gradient 
(PCG) algorithm, and alleviates the critical slowing down that is especially severe close to 
the critical point. Numerical results using random resistor networks substantiate the effi- 
ciency of the present algorithm. 
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1 Introduction 



Progressive damage evolution leading to failure of disordered quasi-brittle mate- 
rials has been studied extensively using various types of discrete lattice models 

flaaaaaai. 

Numerical simulation of large lattice networks has often 
been hampered due to critical slowing down associated with the iterative solvers as 
the lattice system approaches macroscopic fracture. The authors have developed a 
multiple-rank sparse Cholesky update algorithm based on direct solvers for simu- 
lating fracture using discrete lattice systems [9]. Using the algorithm presented in 
ii, the authors have reported numerical simulation results for large 2D lattice sys- 
tems (e.g., L = 512), which to the authors knowledge, was so far the largest lattice 
system used in studying damage evolution using discrete lattice systems. Although 
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the sparse direct solvers presented in [9] are superior to iterative solvers in two- 
dimensional lattice systems, for 3D lattice systems, the memory demands brought 
about by the amount of fill-in during sparse Cholesky factorization favor iterative 
solvers. Hence, iterative solvers are in common use for large-scale 3D lattice sim- 
ulations. As the lattice system gets closer to macroscopic fracture, the condition 
number of the system of linear equations increases, thereby increasing the number 
of iterations required to attain a fixed accuracy. This becomes particularly signifi- 
cant for large lattices. Fourier accelerated PCG iterative solvers [10,[ll|,[l2|] have 
been used in the past to alleviate the critical slowing down. However, the Fourier ac- 
celeration technique based on ensemble averaged circulant preconditioner is not ef- 
fective when fracture simulation is performed using central-force and bond-bending 
lattice models 111 ill . The main focus of the current paper is on developing an effi- 
cient algorithm based on iterative solvers for large-scale 3D lattice simulations, 
and the block-circulant preconditioner presented in the current paper is an effort 
towards this goal. 



Since the Laplacian operator on a discrete lattice network results in the block struc- 
ture of the stiffness matrix, we propose to use block circulant matrices H 1 3L 1 1 411 as 
preconditioners to the stiffness matrix for solving this class of problems. The pro- 
posed algorithm is benchmarked against the commonly used incomplete LU and 
Cholesky preconditioners Hill , and the optimal 1 3, 17, 18, 14] and superoptimal 
I T9L 14 1 circulant preconditioners to the Laplacian operator (Kirchhoff equations). 
The advantage of using the circulant preconditioners is that they can be diagonal- 
ized by discrete Fourier matrices, and hence the inversion of Udof-hy-Udof circulant 
matrix can be done in O(n do f log n do f) operations by using FFTs of size n do f. In 
addition, since the convergence rate of the PCG method depends on the condition 
number of the preconditioned system, it is possible to choose a circulant precondi- 
tioner that minimizes the condition number of the preconditioned system ill9lll4ll . 
Furthermore, these circulant preconditioned systems exhibit favorable clustering 
of eigenvalues. In general, the more clustered the eigenvalues are, the faster the 
convergence rate is. Another important property of these circulant preconditioners 
proposed in this study is that they are positive definite if the stiffness matrix itself is 
positive definite. In this regard, we note that the Fourier accelerated PCG presented 
in ifioi [III H2I1 is not optimal in the sense described in lfl^[l7[ll8[[l4ll . and hence 
is expected to take more number of CG iterations compared with the optimal and 
superoptimal circulant preconditioners. 



In this paper, we analyze a random threshold model problem, where a lattice con- 
sists of fuses having the same conductance, but the bond breaking thresholds, i c , 
are based on a broad (uniform) probability distribution, which is constant between 
and 1. This relatively simple model has been extensively used in the literature 
for simulating the fracture and progressive damage evolution in brittle materials, 
and provides a meaningful benchmark for comparing different algorithms. A broad 
thresholds distribution represents large disorder and exhibits diffusive damage lead- 
ing to progressive localization, whereas a very narrow thresholds distribution ex- 
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hibits brittle failure in which a single crack propagation causes material failure. Pe- 
riodic boundary conditions are imposed in the horizontal direction to simulate an 
infinite system and a constant voltage difference (displacement) is applied between 
the top and the bottom of lattice system. The simulation is initiated with a triangular 
lattice of intact fuses of size L x L, in which disorder is introduced through random 
breaking thresholds. The voltage V across the lattice system is increased until a 
fuse (bond breaking) burns out. The burning of a fuse occurs whenever the electri- 
cal current (stress) in the fuse (bond) exceeds the breaking threshold current (stress) 
value of the fuse. The current is redistributed instantaneously after a fuse is burnt. 
The voltage is then gradually increased until a second fuse is burnt, and the process 
is repeated. Each time a fuse is removed, the electrical current is redistributed and 
hence it is necessary to re-solve Kirchhoff equations to determine the current flow- 
ing in the remaining bonds of the lattice. This step is essential for determining the 
fuse that is going to burn up under the redistributed currents. Therefore, numerical 
simulations leading to final breaking of lattice system network are very time con- 
suming especially with increasing lattice system size. Consequently, an efficient 
preconditioner to the Laplacian operator on fractal networks that mitigates the ef- 
fect of critical slowing down as the lattice system approaches macroscopic fracture 
is of utmost importance in the numerical simualtion of material breakdown. 

In the following, we present point-circulant and block circulant preconditioners for 
solving the linear system of equations that arise during the numerical simulation of 
progressive fracture in brittle materials using the random threshold model. 



2 Circulant Preconditioners for CG Iterative Solvers 

Consider the ndof X ndof stiffness matrix A. The optimal circulant preconditioner 
c(A) [16] is defined as the minimizer of ||C — A\\p over all ndof x ridnf circulant 
matrices C. In the above description, \\-\\ F denotes the Frobenius norm [15], and the 
matrix c(A) is called an optimal circulant preconditioner because it minimizes the 
norm ||C — A\\p. The optimal circulant preconditioner c( A) is uniquely determined 
by A, and is given by 

c(A)=F*5(FAF*)F (1) 

where F denotes the discrete Fourier matrix, 5 (A) denotes the diagonal matrix 
whose diagonal is equal to the diagonal of the matrix A, and * denotes the adjoint 
(i.e. conjugate transpose). It should be noted that the diagonal elements of the ma- 
trix 5 (FAF*) represent the eigenvalues of the matrix c(A) and can be obtained in 
0{jidof log ndof) operations by taking the FFT of the first column of c(A). The first 
column vector of T. Chan's optimal circulant preconditioner matrix that minimizes 
the norm ||C — A\\ F is given by 
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1 n dof 

Z~~ zl a j,(j-i+i) mod 

lidof j = l 



ndof 



(2) 



The above formula can be interpreted simply as follows: the element q is simply 
the arithmetic average of those diagonal elements of A extended to length ndof by 
wrapping around and containing the element a^i. If the matrix A is a Hermitian 
matrix, then the eigenvalues of c( A) are bounded from below and above by 

(c(A)) < X max (c(A)) < X max (A) (3) 



where A m j n (-) and \ ma x(-) denote the minimum and maximum eigenvalues, re- 
spectively. Based on the above result, if the matrix A is positive definite, then the 
circulant preconditioner c(A) is also positive definite. In particular, if the circu- 
lant preconditioner is such that the spectra of the preconditioned system is clus- 
tered around 1, then the convergence of the solution will be fast. The superoptimal 
circulant preconditioner t(A) 

El is based on the idea of minimizing the norm 
|| I — C _1 A||i? over all nonsingular circulant matrices C. In the above description, 
t(A) is superoptimal in the sense that it minimizes ||I — C _1 A|| F , and is equal to 

i(A) = c(AA*)c(A)- 1 (4) 



The preconditioner obtained by Eq. (4) is also positive definite if the matrix A itself 
is positive definite. Although the preconditioner t(A) is obtained by minimizing 
the norm ||I — C _1 A|| F , the asymptotic convergence of the preconditioned system 
is same as c(A) for large n do f system. Hence, in this study, we limit ourselves 
to the investigation of preconditioned systems using c(A) given by Eq. (2). The 
computational cost associated with the solution of preconditioned system c(A)z = 
r is the initialization cost of nnz(A) for setting the first column of c(A) using Eq. 
(2) during the first iteration, and 0(nd j log ndof) during every iteration step. 

In order to distinguish the block circulant preconditioners that follow from the 
above described circulant preconditioners, we refer henceforth to the above pre- 
conditioners as point-circulant preconditioners. 



2. 1 Block-circulant preconditioners 



Let the matrix A is pardoned into r-by-r blocks such that each block is an s-by-s 
matrix. That is, n do f — rs, and 
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A 



A 1A A 1>2 ■ • • Ai )P 

A-2,1 A 2 ,2 ■ ■ ■ A 2>r 



(5) 



Although the point-circulant preconditioner c(A) defined by Eq. (2) can be used 
as a preconditioner, in general, the block structure is not restored by using c( A) as 
a preconditioner. In contrast, the circulant-block preconditioners obtained by using 
circulant approximations for each of the blocks restore the block structure of A. 
The circulant-block preconditioner of A can be expressed as 



Cfl(A) 



c(A 1)1 ) c(A 1)2 ) ■ ■ ■ c(A ljr ) 
c(A 2 ,i) c(A 2j2 ) • ■ ■ c(A 2 , r ) 



c(A r , 1 ) c(A 



r,2 ■ ■ ■ 



c( A rjr ) 



(6) 



The circulant-block preconditioner defined by Eq. (6) is the minimizer of 1 1 C — A || p 
over all matrices C that are r-by-r block matrices with s-by-s circulant blocks. 
The spectral properties as given by Eq. (3) for point-circulant preconditioners also 
extend to the circulant-block preconditioners II 1 3L 1 1 -4fl . That is, 



A, 



i (A) <J A m j n (c s (A)) < A maa; (cB(A)) < A 



(7) 



In particular, if the matrix A is positive definite, then the block-preconditioner 
c B (A) is also positive definite. 



The computational cost associated with the circulant-block preconditioners can be 
estimated as follows. Since the stiffness matrix A is real symmetric for the type of 
problems considered in this study, in the following, we assume block symmetric 
structure for A, i.e., A^j = Ajj. In forming the circulant-block preconditioner 
given by Eq. (6), it is necessary to obtain point-circulant preconditioners for each 
of the r-by-r block matrices of order s. Point-circulant approximation for each of 
the s-by-s blocks requires 0(s log s) operations. This cost is in addition to the 
cost associated in forming the first column vectors (Eq. (2)) for each of the c(Ajj) 
blocks, which is given by nnz(A) operations. Since there are (r(r + l))/2 blocks, 
we need 0(r 2 s log s) operations to form 
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A = (I<g>F)c B (A)(I<8>F*) 



S (FA 1)1 F*) 5 (FA lj2 F*) • • • 5 (FA ljT .F*) 
S (FA 2 ,iF*) 5 (FA 2j2 F*) ■■■ 5 (FA 2>r F* 



0) 



5(FA ril F*) 5(FA r , 2 F*) ••• 5(FA r , r F*) 



In the above equation, ® refers to the Kronecker tensor product and I is an r-by-r 
identity matrix. In order to solve the preconditioned equation c B (A)z = r, the Eq. 
(8) is permuted to obtain a block-diagonal matrix of the form 



A = P*AP = 



A M ••• 
A 2j2 ••• 

••• A,. 



(9) 



where P is the permutation matrix such that 



Hj 



[5(FA hl F*)] kk Vl<i,j<r, 1 < k < s 



(10) 



During each iteration step, in order to solve the preconditioned system c B (A)z = r, 
it is necessary to invert the block-diagonal matrix A. This task can be performed 
by first factorizing each of the A^ blocks during the first iteration step, and then 
subsequently using these factored matrices to do the baclsolve operations. Hence, 
without considering the first factorizing cost of each of the block diagonals, during 
each iteration step, the number of operations involving the inversion of A is 



delops = 0(Y,\^A kk \) (ID 
k=i 

where C^ k denotes the number of non-zeros in the Cholesky factorization of the 

matrix A fe>fc . Therefore, the system cg(A)z = r can be solved in 0(rs log s) + 
delops operations per iteration step. Thus, we conclude that for the circulant-block 
preconditioner, the initialization cost is nnz(A) + 0(r 2 s log s) plus the cost as- 
sociated with the factorization of each of the diagonal blocks A fc jfc during the first 
iteration, and 0(rs log s) + delops during every iteration step. 

Although from operational cost per iteration point of view, the point-circulant pre- 
conditioner may prove advantageous for some problems, it is not clear whether 
point-circulant or circulant-block is closest to the matrix A in terms of the num- 
ber of CG iterations necessary for convergence. Hence, we investigate both point- 
circulant and circulant-block preconditioners in obtaining the solution of the linear 
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system Ax = b using iterative techniques. In addition, we also employ the com- 
monly used point and block versions of the incomplete LU preconditioners to solve 
the linear system Ax = b. 

REMARK 1: In the case of 2D discrete lattice network with periodic boundary 
conditions in the horizontal direction and a constant voltage difference between the 
top and the bottom of the lattice network, the matrix A is a block tri-diagonal real 
symmetric matrix. Under these circumstances, the initialization cost is nnz(A) + 
0(rs log s). Since each of the diagonal blocks A^ is a 2 x 2 matrix, during 
each iteration step, the solution involving the inversion of A can be obtained in 
O(s) operations. Thus, the cost per iteration is 0(rs log s) + O(s) = 0(rs log s) 
operations. The total computational cost involved in using the circulant-block pre- 
conditioner for a symmetric block tri-diagonal matrix is the initialization cost of 
nnz(A) + 0(rs log s), and 0(rs log s) operations per iteration step. This is signif- 
icantly less than the computational cost involved in using a generic circulant-block 
preconditioner. It should be noted that the block tri-diagonal structure of A does 
not change the computational cost associated with using a point-circulant precon- 
ditioner to solve the linear system Ax = b. 



3 Numerical Simulation Results 



In the following, we benchmark the proposed block circulant preconditioner against 
the optimal | la, 12 , H> 14 ] circulant preconditioner used for the Laplacian oper- 
ator (Kirchhoff equations). The main purpose behind the 2D lattice simulations 
presented below is to demonstrate the efficiency of block-circulant preconditioner 
over the optimal circulant preconditioner for the iterative solvers. Once again, we 
note that the type of ensemble-averaged circulant preconditioner presented in 
lll,[l2|] is not optimal in the sense described in II16LI17LI181I14Ii . and hence is ex- 



pected to take more number of CG iterations compared with the optimal circulant 
preconditioners. In the case of 2D lattice systems, we also present the simulation 
results using Solver type A of the Ref. [9] based on sparse direct solvers. As noted 
earlier, the sparse direct solvers presented in ||9j] are superior to the iterative solvers 
for 2D lattice systems, even with the block-circulant preconditioner presented in the 
current paper. However, the main advantage of the block-circulant preconditioner 
using iterative solvers is in the case of simulation of 3D lattice systems, where the 
usage of sparse direct solvers is limited by the (random access) memory constraints. 

The numerical results presented in Tables 1-5 (for 2D lattices) and 7-10 (for 3D 
lattices) are performed on a single processor of Cheetah (27 Regatta nodes with 
thirty two 1.3 GHz Power4 processors each, http://www.ccs.ornl.gov). However, 
the numerical simulation results presented in Tables 6 (for 2D lattices) and 1 1 (for 
3D lattices) are performed on a single processor of Eagle (184 nodes with four 375 
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MHz Power3-II processors) supercomputer at the Oak Ridge National Laboratory 
to run simulations simultaneously on more number of processors. In all of the it- 
erative schemes presented below, we employ a residual tolerance of e = 10~ 12 for 
convergence of the iterations. Tables 1 and 2 present the cpu and wall-clock times 
taken on a single processor of Cheetah for one configuration (simulation) using the 
block circulant and the optimal circulant precondioned CG iterative solvers, respec- 
tively. In the case of two-dimensional block circulant PCG, we partition the matrix 
A into L-by-L blocks such that each block is a (L + 1) x (L + 1) matrix. For com- 
parison purposes, we also present in Tables 3 and 4, the cpu and wall-clock times 
taken by un-preconditioned and incomplete Cholesky preconditioned CG solvers. 
Table 5 presents the performance of the sparse direct solver (Solver type a) reported 
in |0]. As discussed earlier, for 2D lattice systems, the sparse direct solvers and the 
incomplete Cholesky preconditioner are clearly superior to the block-circulant pre- 
conditioned CG iterative solver. However, this advantage of direct solvers (or the 
preconditioners such as incomplete Cholesky based on direct solvers) vanishes for 
large 3D lattice systems due to the amount of fill-in during Cholesky factorization. 
In tables 1-5, N con f ig indicates the number of configurations over which ensem- 
ble averaging of the numerical results is performed, and the number of iterations 
denote the average number of total iterations taken to break one intact lattice con- 
figuration until it falls apart. For some iterative solvers, the simulations for larger 
lattice systems were not performed either because they were expected to take larger 
cpu times or the numerical results do not influence the conclusions drawn in this 
study. In Table 6, we present the average number of bonds broken at the peak load 
and at failure per lattice (triangular) configuration. It should also be noted that in 
Table 6, we were able to perform emsemble averaging over many number of con- 
figurations because we were able to run these simulations simultaneously on many 
number of Eagle 375 MHz Power3-II processors. 

In addition to the above presented simulations on two-dimensional (2D) triangu- 
lar lattices, we have also carried out simulations on three-dimensional (3D) cubic 
lattice networks to investigate the efficiency of block circulant PCG solvers in 3D 
simulations. Figure 1 presents the snapshots of progressive damage evolution for 
the case of a broadly distributed random thresholds model problem in a cubic lattice 
system of size L = 48. The spanning cluster is shown in Fig. 2. Tables 7-10 present 
the cpu and wall-clock times taken on a single processor of Cheetah for simulating 
one-configuration using the block circulant, optimal circulant, un-preconditioned, 
and the incomplete Cholesky iterative solvers, respectively. It should be noted that 
for large 3D lattice systems (e.g., L = 32), the performance of incomplete Cholesky 
preconditioner (see Table 10) is similar to that of block-circulant preconditioner 
(see Table 7), even though the performance of incomplete Cholesky preconditioner 
is far more superior in the case of 2D lattice simulations. The memory limitations 
severely restricted the use of sparse direct solvers for simulating large 3D lattice 
systems, and hence the results corresponding to the direct solver for 3D lattice sys- 
tems are not presented. In the case of block circulant PCG, we once again partition 
the matrix A into L-by-L blocks of size (L + l) 2 x [L + l) 2 matrices. It should 
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be noted that in order to get maximum efficiency using the block circulant PCG 
solver, it is possible to further partition each of the (L + l) 2 x (L + l) 2 matrix 
blocks into (L + 1) x [L + 1) blocks of matrices of size (L + 1) x (L + 1). How- 
ever, the results presented in this study do not perform such nested block circulant 
precondioning. Table 1 1 presents the average number of bonds broken at the peak 
load and at failure per lattice configuration. 



4 Conclusions 

The main focus of the current paper is on developing an efficient algorithm based on 
iterative solvers for simulating large 3D fuse networks. Although the sparse direct 
solvers presented in [9] achieve superior performance over iterative solvers in 2D 
lattice systems, the available random access memory poses a severe constraint over 
the usage of sparse direct solvers for large 3D lattice systems due to the amount 
of fill-in during sparse Cholesky factorization. In this regard, the block-circulant 
preconditioner presented in the current paper is an effort toward efficiently solving 
large 3D fuse networks. 

Based on the numerical simulation results presented in Tables 1-5 (2D) and Tables 
7-10 (3D) for random threshold fuse model networks, it is clear that the block cir- 
culant preconditioned CG is superior to the optimal circulant preconditioned PCG 
solver, which in turn is superior to the Fourier accelerated PCG solvers. Further- 
more, in the case of large 3D lattice systems, the block-circulant preconditioner 
exhibits superior performance (for system sizes L > 32) over the sparse direct 
solvers and the related incomplete Cholesky preconditioned CG solvers. 

In addition, during the CG iterative solution, the preconditioned system using the 
block-circulant preconditioner is trivially parallel, and hence a parallel implemen- 
tation of the block-circulant precondioner can be employed to further speed up the 
solution of large 3D lattice systems. This allowed us to consider larger 3D lattice 
simulations, which will be a subject of future publication. 
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Fig. 1. Snapshots of damage in a typical cubic lattice system of size L = 48. Number 
of broken bonds at the peak load and at failure are 48904 and 54744, respectively, (a)-(i) 
represent the snapshots of damage after breaking n b number of bonds. The coloring scheme 
is such that in each snapshot, the bonds broken in the early stages are colored blue, then 
green, followed by yellow, and finally the last stage of broken bonds are colored red. (a) 
n b = 20000 (b) n b = 40000 (c) n b = 48904 (peak load) (d) n b = 51000 (e) n b = 52500 
(f) n b = 53500 (g) n b = 54000 (h) n b = 54500 (i) n b = 54744 (failure) 
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Fig. 2. Spanning cluster in a typical cubic lattice system of size L = 48. The coloring 
scheme is such that the bonds broken in the early stages are colored blue, then green, 
followed by yellow, and finally the last stage of broken bonds are colored red 
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Table 1 

Block Circulant PCG: 2D Triangular Lattice 



Size 


CPU(sec) 


Wall(sec) 


Iterations 


N C onfig 


32 


10.00 


10.68 


11597 


20000 


64 


135.9 


139.8 


41207 


1600 


128 


2818 


2846 


147510 


192 


256 


94717 


96500 




32 



Table 2 

Optimal Circulant PCG: 2D Triangular Lattice 



Size 


CPU(sec) 


Wall(sec) 


Iterations 


Nconfig 


32 


11.66 


12.26 


25469 


20000 


64 


173.6 


178.8 


120570 


1600 


128 


7473 


7725 


622140 


128 



Table 3 

Un-preconditioned CG: 2D Triangular Lattice 



Size 


CPU(sec) 


Wall(sec) 


Iterations 


N C onfig 


32 


7.667 


8.016 


66254 


20000 


64 


203.5 


205.7 


405510 


1600 



Table 4 

Incomplete Cholesky PCG: 2D Triangular Lattice 



Size 


CPU(sec) 


Wall(sec) 


Iterations 


-^config 


32 


2.831 


3.008 


5857 


20000 


64 


62.15 


65.61 


29496 


4000 


128 


1391 


1430 


148170 


320 



14 



Table 5 

Computational cost associated with solver type A of Ref. Q] 



Size 


CPU(sec) 


Wall(sec) 


N C onfig 


32 


0.592 


0.687 


20000 


64 


10.72 


11.26 


4000 


128 


212.2 


214.9 


800 


256 


5647 


5662 


96 


512 


93779 


96515 


16 



Table 6 

Number of broken bonds at peak and at failure 



L 


N C onfig 


Triangular 


n p (mean) 


n p (std) 


rif (mean) 


rif (std) 


4 


50000 


13 


3 


19 


3 


8 


50000 


41 


8 


54 


7 


16 


50000 


134 


19 


168 


18 


24 


50000 


276 


32 


335 


31 


32 


50000 


465 


48 


554 


46 


64 


50000 


1662 


130 


1911 


121 


128 


12000 


6068 


386 


6766 


349 


256 


1200 


22572 


1151 


24474 


1046 



Table 7 

Block Circulant PCG: 3D Cubic Lattice 



Size 


CPU(sec) 


Wall(sec) 


Iterations 


-Nconfig 


10 


16.54 


16.99 


16168 


40000 


16 


304.6 


308.5 


58756 


1920 


24 


2154 


2216 


180204 


256 


32 


12716 


12937 


403459 


128 


48 


130522 


133063 


1253331 


32 



15 



Table 8 

Optimal Circulant PCG: 3D Cubic Lattice 



Size 


CPU(sec) 


Wall(sec) 


Iterations 




10 


15.71 


16.10 


27799 


40000 


16 


386.6 


391.1 


121431 


1920 


24 


2488 


2548 


446831 


256 


32 


20127 


20380 


1142861 


32 


48 


233887 


237571 


4335720 


32 



Table 9 

Un-preconditioned CG: 3D Cubic Lattice 



Size 


CPU(sec) 


Wall(sec) 


Iterations 


-^config 


10 


5.962 


6.250 


48417 


40000 


16 


119.4 


123.0 


246072 


3840 


24 


1923 


1982 


1030158 


256 


32 


16008 


16206 


2868193 


64 



Table 10 

Incomplete Cholesky PCG: 3D Cubic Lattice 



Size 


CPU(sec) 


Wall(sec) 


Iterations 




10 


5.027 


5.262 


8236 


40000 


16 


118.1 


122.3 


42517 


3840 


24 


1659 


1705 


152800 


512 


32 


12091 


12366 


422113 


64 



Table 11 

Number of broken bonds at peak and at failure 



L 


Nconfig 


Cubic 


n p (mean) 


n p (std) 


nf (mean) 


nf (std) 


10 


40000 


563 


57 


726 


59 


16 


3840 


2108 


147 


2572 


152 


24 


512 


6692 


354 


7882 


337 


32 


128 


15329 


705 


17691 


649 


48 


32 


49495 


1582 


55768 


1523 



16 



